Goto

Collaborating Authors

 generalisation ability



How does Weight Correlation Affect Generalisation Ability of Deep Neural Networks?

Neural Information Processing Systems

This paper studies the novel concept of weight correlation in deep neural networks and discusses its impact on the networks' generalisation ability. For fully-connected layers, the weight correlation is defined as the average cosine similarity between weight vectors of neurons, and for convolutional layers, the weight correlation is defined as the cosine similarity between filter matrices. Theoretically, we show that, weight correlation can, and should, be incorporated into the PAC Bayesian framework for the generalisation of neural networks, and the resulting generalisation bound is monotonic with respect to the weight correlation. We formulate a new complexity measure, which lifts the PAC Bayes measure with weight correlation, and experimentally confirm that it is able to rank the generalisation errors of a set of networks more precisely than existing measures. More importantly, we develop a new regulariser for training, and provide extensive experiments that show that the generalisation error can be greatly reduced with our novel approach.



Review for NeurIPS paper: How does Weight Correlation Affect Generalisation Ability of Deep Neural Networks?

Neural Information Processing Systems

Inspired by a PAC-Bayes risk bound for Deep Neural Nets (DNNs) with a Gaussian posterior having a covariance matrix determined by the correlation between the weight vectors within the same layer, the authors propose a weight correlation descent algorithm for regularizing DNNs. The extensive numerical experiments provide a clear evidence of the advantage of reducing the correlation between the weight vectors within the same layer. We think that this regularizer, easy to implement, can provide an alternative (or be complementary) to other currently-used regularizers such as weight decay and drop-out.


Review for NeurIPS paper: How does Weight Correlation Affect Generalisation Ability of Deep Neural Networks?

Neural Information Processing Systems

Weaknesses: * I worry that the claims about the measure being theoretically grounded are wrong, or at least misleading. The way I understand it, the paper introduces a method - WCD - which minimises weight correlation along with the loss. In order to provide performance guarantees like in Eq. (3) for this method, one would have to compute the posterior Q that WCD actually gives rise to. Instead, the paper defines a separate posterior, which is inspired by similar concepts, but essentially comes from nowhere and has no reason to be tied to WCD. I therefore find the discussion in Section 4 misleading.


How does Weight Correlation Affect Generalisation Ability of Deep Neural Networks?

Neural Information Processing Systems

This paper studies the novel concept of weight correlation in deep neural networks and discusses its impact on the networks' generalisation ability. For fully-connected layers, the weight correlation is defined as the average cosine similarity between weight vectors of neurons, and for convolutional layers, the weight correlation is defined as the cosine similarity between filter matrices. Theoretically, we show that, weight correlation can, and should, be incorporated into the PAC Bayesian framework for the generalisation of neural networks, and the resulting generalisation bound is monotonic with respect to the weight correlation. We formulate a new complexity measure, which lifts the PAC Bayes measure with weight correlation, and experimentally confirm that it is able to rank the generalisation errors of a set of networks more precisely than existing measures. More importantly, we develop a new regulariser for training, and provide extensive experiments that show that the generalisation error can be greatly reduced with our novel approach.


Graph as a feature: improving node classification with non-neural graph-aware logistic regression

Delarue, Simon, Bonald, Thomas, Viard, Tiphaine

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) and their message passing framework that leverages both structural and feature information, have become a standard method for solving graph-based machine learning problems. However, these approaches still struggle to generalise well beyond datasets that exhibit strong homophily, where nodes of the same class tend to connect. This limitation has led to the development of complex neural architectures that pose challenges in terms of efficiency and scalability. In response to these limitations, we focus on simpler and more scalable approaches and introduce Graph-aware Logistic Regression (GLR), a non-neural model designed for node classification tasks. Unlike traditional graph algorithms that use only a fraction of the information accessible to GNNs, our proposed model simultaneously leverages both node features and the relationships between entities. However instead of relying on message passing, our approach encodes each node's relationships as an additional feature vector, which is then combined with the node's self attributes. Extensive experimental results, conducted within a rigorous evaluation framework, show that our proposed GLR approach outperforms both foundational and sophisticated state-of-the-art GNN models in node classification tasks. Going beyond the traditional limited benchmarks, our experiments indicate that GLR increases generalisation ability while reaching performance gains in computation time up to two orders of magnitude compared to it best neural competitor.


Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm

Tang, Xirui, Li, Feiyang, Cao, Zinan, Yu, Qixuan, Gong, Yulu

arXiv.org Artificial Intelligence

In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study firstly conducted a statistical analysis of the data, and then trained and tested using the traditional random forest model, the random forest model improved by the sparrow search algorithm, and the random forest algorithm improved based on the iterative local search-sparrow search algorithm, respectively. The results show that the traditional random forest model has a prediction accuracy of 93% on the training set but only 73.3% on the test set, which is poor in generalisation; whereas the model improved by the sparrow search algorithm has a prediction accuracy of 94% on the test set, which is improved compared with the traditional model. What is more noteworthy is that the improved model based on the iterative local search-sparrow search algorithm achieves 100% accuracy on both the training and test sets, which is significantly better than the other two methods. These research results provide new ideas and methods for VR user experience prediction, especially the improved model based on the iterative local search-sparrow search algorithm performs well and is able to more accurately predict and classify the user's VR experience. In the future, the application of this method in other fields can be further explored, and its effectiveness can be verified through real cases to promote the development of AI technology in the field of user experience.


Improved AdaBoost for Virtual Reality Experience Prediction Based on Long Short-Term Memory Network

Fan, Wenhan, Ding, Zhicheng, Huang, Ruixin, Zhou, Chang, Zhang, Xuyang

arXiv.org Artificial Intelligence

A classification prediction algorithm based on Long Short-Term Memory Network (LSTM) improved AdaBoost is used to predict virtual reality (VR) user experience. The dataset is randomly divided into training and test sets in the ratio of 7:3.During the training process, the model's loss value decreases from 0.65 to 0.31, which shows that the model gradually reduces the discrepancy between the prediction results and the actual labels, and improves the accuracy and generalisation ability.The final loss value of 0.31 indicates that the model fits the training data well, and is able to make predictions and classifications more accurately. The confusion matrix for the training set shows a total of 177 correct predictions and 52 incorrect predictions, with an accuracy of 77%, precision of 88%, recall of 77% and f1 score of 82%. The confusion matrix for the test set shows a total of 167 correct and 53 incorrect predictions with 75% accuracy, 87% precision, 57% recall and 69% f1 score. In summary, the classification prediction algorithm based on LSTM with improved AdaBoost shows good prediction ability for virtual reality user experience. This study is of great significance to enhance the application of virtual reality technology in user experience. By combining LSTM and AdaBoost algorithms, significant progress has been made in user experience prediction, which not only improves the accuracy and generalisation ability of the model, but also provides useful insights for related research in the field of virtual reality. This approach can help developers better understand user requirements, optimise virtual reality product design, and enhance user satisfaction, promoting the wide application of virtual reality technology in various fields.


A PAC-Bayesian Link Between Generalisation and Flat Minima

Haddouche, Maxime, Viallard, Paul, Simsekli, Umut, Guedj, Benjamin

arXiv.org Machine Learning

Modern machine learning usually involves predictors in the overparametrised setting (number of trained parameters greater than dataset size), and their training yield not only good performances on training data, but also good generalisation capacity. This phenomenon challenges many theoretical results, and remains an open problem. To reach a better understanding, we provide novel generalisation bounds involving gradient terms. To do so, we combine the PAC-Bayes toolbox with Poincar\'e and Log-Sobolev inequalities, avoiding an explicit dependency on dimension of the predictor space. Our results highlight the positive influence of \emph{flat minima} (being minima with a neighbourhood nearly minimising the learning problem as well) on generalisation performances, involving directly the benefits of the optimisation phase.